{smcl}
{* 4 Mar 2002}{...}
{hline}
help for {hi:split}
{hline}

{title:Splitting string variables into parts} 

{p 8 12}{cmd:split} 
{it:strvar}
[{cmd:if} {it:exp}] 
[{cmd:in} {it:range}] 
[, 
{cmdab:g:enerate(}{it:stub}{cmd:)}
{cmdab:p:arse(}{it:parse_strings}{cmd:)}  
{cmdab:l:imit(}{it:#}{cmd:)}  
[{cmdab:no:}]{cmdab:t:rim} 
{cmd:destring} 
{it:destring_options} 
]

{title:Description} 

{p}{cmd:split} splits the contents of a string variable {it:strvar} into one or
more parts, using one or more {it:parse_strings} (by default blank space(s)),
so that new string variables are generated.  It is thus useful for separating
`words' or parts of a string variable. {it:strvar} itself is not modified. 
 
 
{title:Options} 
 
{p 0 4}{cmd:generate(}{it:stub}{cmd:)} specifies the beginning characters of
the new variable names, so that new variables {it:stub}{cmd:1}, {it:stub}{cmd:2},
etc., are produced. {it:stub} defaults to {it:strvar}. 
 
{p 0 4}{cmd:parse(}{it:parse_strings}{cmd:)} specifies that, instead of spaces,
parsing should be done using one or more {it:parse_strings}. Most commonly, one
string which is a single punctuation character will be specified.  For example,
if {cmd:parse(,)} is specified, then {cmd:{bind:"1,2,3"}} is split into
{cmd:"1"}, {cmd:"2"} and {cmd:"3"}. 
 
{p 4 4}It is also possible to specify (1) two or more strings which are
alternative separators of `words' and/or (2) strings which consist of two or
more characters.  Alternative strings should be separated by spaces and strings
which include spaces should be bound by {cmd:{bind:" "}}. Thus if
{cmd:{bind:parse(, " ")}} is specified, then {cmd:{bind:"1,2 3"}} is also split
into {cmd:"1"}, {cmd:"2"} and {cmd:"3"}.  Note particularly the difference
between (say) {cmd:{bind:parse(a b)}} and {cmd:parse(ab)}: with the first,
{cmd:"a"} and {cmd:"b"} are both acceptable as separators, while with the
second, only the string {cmd:"ab"} is acceptable.

{p 0 4}{cmd:limit(}{it:#}{cmd:)} specifies an upper limit to the 
number of new variables to be created. Thus {cmd:limit(2)} specifies that 
at most two new variables should be created. 
 
{p 0 4}{cmd:notrim} specifies that the original string variable should not be
trimmed of leading and trailing spaces before being parsed. {cmd:trim} is the
default. 
 
{p 0 4}{cmd:destring} applies {help destring} to the new string variables, 
replacing the variables initially created as string by numeric variables
where possible. 
 
{p 0 4}{it:destring_options} qualify the application of {cmd:destring}. 
Possible options are {cmd:float}, {cmd:force}, {cmd:ignore()} and 
{cmd:percent}. For details, see {help destring}. 
 
 
{title:Examples} 
 
{p}1. Suppose that input is somehow misread as one string variable, say when
you copy and paste into the data editor, but data are space-separated:
 
{p 4 8}{inp:. split var1, destring}

{p}2. Suppose a string variable holds names of legal cases which should be split
into variables for plaintiff and defendant. The separators could be
{inp:{bind:" V "}}, {inp:{bind:" V. "}}, {inp:{bind:" VS "}} and 
{inp:{bind:" VS. "}}.  Note particularly the leading and trailing spaces: 
{inp:"V"}, for example, would incorrectly split {inp:{bind:"GOLIATH V DAVID"}}.
 
{p 4 8}{inp:. split case, p(" V " " V. " " VS " " VS. ")}

{p}Signs of problems would be the creation of more than two 
variables and any variable having blank values, so check: 

{p 4 8}{inp:. list case2 if case2 == ""} 

{p}3. Suppose a string variable holds time of day in the form "hh:mm:ss", 
e.g. {inp:"12:34:56"}. 

{p 4 8}{inp:. split hms, p(:) destring}{p_end} 
{p 4 8}{inp:. gen timeofday = hms1 + hms2 / 60 + hms3 / 3600} 

{p}Or suppose a string variable holds time of day in the form 
{bind:"hh:mm:ss am"} or {bind:"hh:mm:ss pm"},
e.g. {inp:"06:54:32 am"}, "{inp:11:22:33 pm}". 

{p 4 8}{inp:. split hms, p(: " ") destring}{p_end} 
{p 4 8}{inp:. gen timeofday = hms1 + hms2 / 60 + hms3 / 3600 + 12 * (hms4 == "pm")} 

{p}4. Email addresses split at {inp:"@"}: 

{p 4 8}{inp:. split address, p(@)} 


{title:Author} 

         Nicholas J. Cox, University of Durham, U.K.
         n.j.cox@durham.ac.uk


{title:Acknowledgement}

{p}This program has benefitted substantially from the work of Michael
Blasnik on an earlier jointly written program. Ken Higbee made very
useful comments. 


{title:Also see}

On-line: help for {help destring}, {help egen} ({cmd:ends()}) 
 Manual: {hi:[R] destring}, {hi:[R] egen} 


